AITopics | few-shot audio-visual learning

Few-Shot Audio-Visual Learning of Environment Acoustics Supplementary Material

Neural Information Processing Systems, Apr-24-2026, 15:34:30 GMT

In this supplementary material we provide additional details about: a video (with audio) for qualitative illustration of our task and qualitative evaluation of our model predictions (Sec. …); evaluation of the impact of the query source location on our model's prediction quality for a fixed receiver (Sec. …). Moreover, we qualitatively demonstrate our model's prediction quality by comparing the predictions with the ground truths, both at the RIR level and in terms of perceptual similarity when the RIRs are convolved with real-world monaural sounds, such as speech and music. We also analyze common failure cases for our model (Sec. …). Please use headphones to hear the spatial audio correctly.
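As a rough illustration of what "convolving an RIR with a monaural sound" means, the sketch below renders a mono signal through a two-channel (left/right) impulse response using plain Python lists. The function names `convolve` and `render_binaural` and the toy sample values are assumptions made for this example; they do not come from the paper or its code, and real RIRs contain thousands of samples and would normally be convolved with an FFT-based routine.

```python
def convolve(signal, rir):
    """Full linear convolution of two 1-D sequences (direct O(N*M) form)."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


def render_binaural(mono, rir_left, rir_right):
    """Apply a separate impulse response per ear to the same mono source."""
    return convolve(mono, rir_left), convolve(mono, rir_right)


# Toy data, invented for illustration only.
mono = [1.0, 0.5, 0.25]        # a short mono "sound"
rir_left = [1.0, 0.0, 0.3]     # direct path plus one late reflection
rir_right = [0.8, 0.2]         # slightly attenuated direct path plus one reflection

left, right = render_binaural(mono, rir_left, rir_right)
```

Each output channel is as long as the source plus the impulse response minus one sample, which is why reverberant audio has a tail that outlasts the dry sound.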

artificial intelligence, few-shot audio-visual learning, machine learning, (13 more...)

Country: Europe > France (0.14)

Technology:

Information Technology > Sensing and Signal Processing (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Few-Shot Audio-Visual Learning of Environment Acoustics

Neural Information Processing Systems, Dec-23-2025, 19:06:23 GMT

Room impulse response (RIR) functions capture how the surrounding physical environment transforms the sounds heard by a listener, with implications for various applications in AR, VR, and robotics. Whereas traditional methods to estimate RIRs assume dense geometry and/or sound measurements throughout the environment, we explore how to infer RIRs based on a sparse set of images and echoes observed in the space. Towards that goal, we introduce a transformer-based method that uses self-attention to build a rich acoustic context, then predicts RIRs of arbitrary query source-receiver locations through cross-attention. Additionally, we design a novel training objective that improves the match in the acoustic signature between the RIR predictions and the targets. In experiments using a state-of-the-art audio-visual simulator for 3D environments, we demonstrate that our method successfully generates arbitrary RIRs, outperforming state-of-the-art methods and, in a major departure from traditional methods, generalizing to novel environments in a few-shot manner.
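The two attention stages mentioned in the abstract can be sketched in a few lines. This is only a schematic of generic scaled dot-product attention, not the authors' model: it omits learned projections, multiple heads, positional or pose encodings, and the RIR decoder, and the embeddings, the `attention` helper, and all values are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    dim_out = len(values[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(dim_out)])
    return out


# Hypothetical embeddings of three sparse observations (e.g. image + echo features).
observations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Stage 1: self-attention, where every observation attends to all observations.
context = attention(observations, observations, observations)

# Stage 2: cross-attention, where a query source-receiver embedding attends to the context.
query = [[0.5, 0.5]]
query_feature = attention(query, context, context)
```

In this toy setup `query_feature` is a weighted average of the context vectors; in a full model, a decoder would map such a feature to the predicted RIR.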

environment acoustic, few-shot audio-visual learning, name change, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Few-Shot Audio-Visual Learning of Environment Acoustics

Neural Information Processing Systems, Oct-9-2024, 18:31:54 GMT

environment acoustic, few-shot audio-visual learning, traditional method, (1 more...)

Genre: Play > Prospect > Charge > Source (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.44)
